Goto

Collaborating Authors

 anonymization process




FNDEX: Fake News and Doxxing Detection with Explainable AI

arXiv.org Artificial Intelligence

The widespread and diverse online media platforms and other internet-driven communication technologies have presented significant challenges in defining the boundaries of freedom of expression. Consequently, the internet has been transformed into a potential cyber weapon. Within this evolving landscape, two particularly hazardous phenomena have emerged: fake news and doxxing. Although these threats have been subjects of extensive scholarly analysis, the crossroads where they intersect remain unexplored. This research addresses this convergence by introducing a novel system. The Fake News and Doxxing Detection with Explainable Artificial Intelligence (FNDEX) system leverages the capabilities of three distinct transformer models to achieve high-performance detection for both fake news and doxxing. To enhance data security, a rigorous three-step anonymization process is employed, rooted in a pattern-based approach for anonymizing personally identifiable information. Finally, this research emphasizes the importance of generating coherent explanations for the outcomes produced by both detection models. Our experiments on realistic datasets demonstrate that our system significantly outperforms the existing baselines


CAT-Walk: Inductive Hypergraph Learning via Set Walks

arXiv.org Artificial Intelligence

Temporal hypergraphs provide a powerful paradigm for modeling time-dependent, higher-order interactions in complex systems. Representation learning for hypergraphs is essential for extracting patterns of the higher-order interactions that are critically important in real-world problems in social network analysis, neuroscience, finance, etc. However, existing methods are typically designed only for specific tasks or static hypergraphs. We present CAt-Walk, an inductive method that learns the underlying dynamic laws that govern the temporal and structural processes underlying a temporal hypergraph. CAt-Walk introduces a temporal, higher-order walk on hypergraphs, SetWalk, that extracts higher-order causal patterns. CAt-Walk uses a novel adaptive and permutation invariant pooling strategy, SetMixer, along with a set-based anonymization process that hides the identity of hyperedges. Finally, we present a simple yet effective neural network model to encode hyperedges. Our evaluation on 10 hypergraph benchmark datasets shows that CAt-Walk attains outstanding performance on temporal hyperedge prediction benchmarks in both inductive and transductive settings. It also shows competitive performance with state-of-the-art methods for node classification.


Comparison of machine learning models applied on anonymized data with different techniques

arXiv.org Artificial Intelligence

Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy it is necessary to apply several anonymization techniques beyond the classical k-anonymity or $\ell$-diversity. However, the application of these methods is directly connected to a reduction of their utility in prediction and decision making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity and additional tools such as $\ell$-diversity, t-closeness and $\delta$-disclosure privacy are also deployed on the well-known adult dataset.


What is Data Anonymization?

#artificialintelligence

Data anonymization is the process of mitigating direct and indirect privacy risks within data, such that there is a measurable way to ensure records cannot be attributed to a specific individual or entity. With an estimated 2.5 quintillion bytes of data being generated every day and an increasing reliance on data to power new applications, machine learning models and AI technologies, the importance of implementing effective anonymization techniques and removing any bottlenecks is crucial to accelerating future developments and innovations. This post is a general introduction to anonymization, and the tools and techniques for providing sufficient privacy protections, so that personally identifiable information (PII) is safe from exposure and exploitation. Data anonymization should be considered a continuous process; one that can require rapid iteration of applying various privacy engineering techniques and then measuring those privacy outcomes until a desired end state is reached. In the following sections, we'll dive deeper into our core tenets of the data anonymization process, and then walkthrough how you might apply them to a notional dataset.